{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "MLAssignment.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "metadata": {
        "id": "BZ-bXNuO_OyT"
      },
      "source": [
        "import pandas as pd\n",
        "import numpy as np\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.naive_bayes import GaussianNB\n",
        "from sklearn import metrics"
      ],
      "execution_count": 36,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fE4_SowlBrQV"
      },
      "source": [
        "dataset = pd.read_csv('sobar-72.csv')\n",
        "\n",
        "# Split into features X and target y\n",
        "\n",
        "y = dataset[\"ca_cervix\"]\n",
        "X = dataset.drop(columns=\"ca_cervix\")\n",
        "\n",
        "# Select the columns of each feature group by name pattern\n",
        "x_behavior = dataset.filter(regex='behavior_')\n",
        "x_intention = dataset.filter(regex='intention_')\n",
        "x_attitude = dataset.filter(regex='attitude_')\n",
        "x_norm = dataset.filter(regex='norm_')\n",
        "x_perception = dataset.filter(regex='perception')\n",
        "x_motivation = dataset.filter(regex='motivation')\n",
        "x_socialSupport = dataset.filter(regex='socialSupport')\n",
        "x_empowerment = dataset.filter(regex='empowerment')\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cgjOou0ykm7A"
      },
      "source": [
        "# Behavior"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8EzAEMpShjer",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3bfd567d-d1cc-4aca-f815-a515565a3869"
      },
      "source": [
        "# Classifier for the behavior features,\n",
        "# using a Gaussian Naive Bayes classifier\n",
        "x_train, x_test, y_train, y_test = train_test_split(x_behavior, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.7333333333333333\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.67      1.00      0.80         8\n",
            "           1       1.00      0.43      0.60         7\n",
            "\n",
            "    accuracy                           0.73        15\n",
            "   macro avg       0.83      0.71      0.70        15\n",
            "weighted avg       0.82      0.73      0.71        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5t8WTnsdkjlW"
      },
      "source": [
        "# Intention"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bG1pvagXkf3u",
        "outputId": "6d133e1c-98ee-4981-b872-437d814cd474"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_intention, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.6666666666666666\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.64      0.88      0.74         8\n",
            "           1       0.75      0.43      0.55         7\n",
            "\n",
            "    accuracy                           0.67        15\n",
            "   macro avg       0.69      0.65      0.64        15\n",
            "weighted avg       0.69      0.67      0.65        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ANoBrG_nlFy4"
      },
      "source": [
        "# Attitude"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "xIxqowb2lJDR",
        "outputId": "375ebcfe-1186-459d-984b-62d39c96a63c"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_attitude, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.5333333333333333\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.53      1.00      0.70         8\n",
            "           1       0.00      0.00      0.00         7\n",
            "\n",
            "    accuracy                           0.53        15\n",
            "   macro avg       0.27      0.50      0.35        15\n",
            "weighted avg       0.28      0.53      0.37        15\n",
            "\n"
          ],
          "name": "stdout"
        },
        {
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
            "  _warn_prf(average, modifier, msg_start, len(result))\n"
          ],
          "name": "stderr"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JU9k4xwq2dv7"
      },
      "source": [
        "# Norm"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cuJGFNLdlgcd",
        "outputId": "c126cc3d-45b2-4cab-86b3-3efe2b29823a"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_norm, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.6666666666666666\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.67      0.75      0.71         8\n",
            "           1       0.67      0.57      0.62         7\n",
            "\n",
            "    accuracy                           0.67        15\n",
            "   macro avg       0.67      0.66      0.66        15\n",
            "weighted avg       0.67      0.67      0.66        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z3P3_H8I2BYx"
      },
      "source": [
        "# Perception"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NT_QTxYnmtvr",
        "outputId": "0721b601-4086-4993-db31-0e5beb3269ec"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_perception, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.8\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.78      0.88      0.82         8\n",
            "           1       0.83      0.71      0.77         7\n",
            "\n",
            "    accuracy                           0.80        15\n",
            "   macro avg       0.81      0.79      0.80        15\n",
            "weighted avg       0.80      0.80      0.80        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bUaRGARt2D-m"
      },
      "source": [
        "# Motivation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "R8NPlIlbnIAE",
        "outputId": "be426ad8-66a3-4869-e724-23f6ebbdcc22"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_motivation, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.7333333333333333\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.67      1.00      0.80         8\n",
            "           1       1.00      0.43      0.60         7\n",
            "\n",
            "    accuracy                           0.73        15\n",
            "   macro avg       0.83      0.71      0.70        15\n",
            "weighted avg       0.82      0.73      0.71        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yskZntVt2GVA"
      },
      "source": [
        "# Social Support"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jxwoOg3YnSd9",
        "outputId": "150ec548-0786-42cd-ea31-58bac1073671"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_socialSupport, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.6\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.58      0.88      0.70         8\n",
            "           1       0.67      0.29      0.40         7\n",
            "\n",
            "    accuracy                           0.60        15\n",
            "   macro avg       0.62      0.58      0.55        15\n",
            "weighted avg       0.62      0.60      0.56        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i_c9zMME2Ip-"
      },
      "source": [
        "# Empowerment"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "qfGval8_na05",
        "outputId": "9d5d3cd6-1aea-404f-d7ee-efa1025791ca"
      },
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(x_empowerment, y, test_size=0.2, random_state=42)\n",
        "\n",
        "nb = GaussianNB()\n",
        "\n",
        "nb.fit(x_train, y_train)\n",
        "\n",
        "y_pred = nb.predict(x_test)\n",
        "\n",
        "print(\"Accuracy: \",metrics.accuracy_score(y_test, y_pred))\n",
        "print(metrics.classification_report(y_test, y_pred))"
      ],
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Accuracy:  0.7333333333333333\n",
            "              precision    recall  f1-score   support\n",
            "\n",
            "           0       0.70      0.88      0.78         8\n",
            "           1       0.80      0.57      0.67         7\n",
            "\n",
            "    accuracy                           0.73        15\n",
            "   macro avg       0.75      0.72      0.72        15\n",
            "weighted avg       0.75      0.73      0.73        15\n",
            "\n"
          ],
          "name": "stdout"
        }
      ]
    },
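    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cells above repeat the same split, fit and score steps for each feature group. As a minimal sketch, the same comparison could be run in one loop. The names `GROUP_PATTERNS` and `evaluate_group` are hypothetical helpers introduced here for illustration, and the cell assumes the imports, `dataset` and `y` defined earlier in the notebook. Because it reuses `test_size=0.2` and `random_state=42`, the accuracies should match the per-group outputs above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Sketch only: GROUP_PATTERNS and evaluate_group are illustrative names,\n",
        "# not part of the original analysis. Relies on pd, train_test_split,\n",
        "# GaussianNB, metrics, dataset and y from the cells above.\n",
        "GROUP_PATTERNS = {\n",
        "    'behavior': 'behavior_',\n",
        "    'intention': 'intention_',\n",
        "    'attitude': 'attitude_',\n",
        "    'norm': 'norm_',\n",
        "    'perception': 'perception',\n",
        "    'motivation': 'motivation',\n",
        "    'socialSupport': 'socialSupport',\n",
        "    'empowerment': 'empowerment',\n",
        "}\n",
        "\n",
        "\n",
        "def evaluate_group(features, target, test_size=0.2, random_state=42):\n",
        "    # Fit GaussianNB on one feature group and return its test-set accuracy.\n",
        "    x_train, x_test, y_train, y_test = train_test_split(\n",
        "        features, target, test_size=test_size, random_state=random_state\n",
        "    )\n",
        "    model = GaussianNB().fit(x_train, y_train)\n",
        "    return metrics.accuracy_score(y_test, model.predict(x_test))\n",
        "\n",
        "\n",
        "# One accuracy per feature group, using the same regex filters as above.\n",
        "results = {\n",
        "    name: evaluate_group(dataset.filter(regex=pattern), y)\n",
        "    for name, pattern in GROUP_PATTERNS.items()\n",
        "}\n",
        "\n",
        "# Print the groups from best to worst accuracy.\n",
        "for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):\n",
        "    print(f'{name:15s} {accuracy:.3f}')"
      ],
      "execution_count": null,
      "outputs": []
    },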
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JIIrUmET1JZs"
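      },
      "source": [
        "All the classifiers above are Gaussian Naive Bayes classifiers. Each one is trained on a different feature group from the behaviour risk cervical cancer dataset: behavior, intention, attitude, norm, perception, motivation, social support and empowerment. From the above runs, the best-performing group is **Perception**, with an accuracy of 0.80 and a weighted F1 score of 0.80. This means the classifier predicted the correct class (cervical cancer present or absent) for 80% of the 15 test patients.\n",
        "Behavior, motivation and empowerment each reached an accuracy of 73%, so they appear similarly useful to the classifier on this split, although equal accuracy alone does not show that they carry the same information: behavior and motivation produce identical reports, while empowerment reaches 73% with a different balance of precision and recall. The same applies to norm and intention, each with an accuracy of 67%. The worst performers are social support with 60% and attitude with 53%; the attitude classifier predicted class 0 for every test sample, so its recall for class 1 is 0.\n",
        "This also suggests that four feature groups, those with at least 73% accuracy (perception, behavior, motivation and empowerment), are the most useful to classification algorithms on this dataset. With only 15 test samples, each misclassified patient changes the accuracy by about 7 percentage points, so this ranking should be treated as tentative."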
      ]
    }
  ]
}